
17 research outputs found

Incorporating Prior Knowledge into Task Decomposition for Large-Scale Patent Classification

Author: B.L. Lu
C.J. Fall
L.S. Larkey
M. Krier
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Abstract. With the adoption of min-max modular support vector machines (SVMs) to solve large-scale patent classification problems, a novel, simple method for incorporating prior knowledge into task decomposition is proposed and investigated. Two kinds of prior knowledge described in patent texts are considered: time information and hierarchical structure information. Through experiments using the NTCIR-5 Japanese patent database, patents are found to have time-varying features that considerably affect classification. The experimental results demonstrate that applying min-max modular SVMs with the proposed method gives performance superior to that of conventional SVMs in terms of training time, generalization accuracy, and scalability.

CiteSeerX

Crossref

Machine Learning in Automated Text Categorization

Author: ANDROUTSOPOULOS I.
ATTARDI G.
BAKER L.D.
BIEBRICHER P.
CAROPRESO M.F.
CAVNAR W.B.
CHAKRABARTI S.
CLACK C.
CLEVERDON C.
COHEN W. W.
COHEN W.W.
DAGAN I.
DEERWESTER S.
DENOYER L.
DIAZ ESTEBAN A.
DRUCKER H.
DUMAIS S.T.
ESCUDERO G.
Fabrizio Sebastiani
FIELD B.
FORSYTH R. S.
FUHR N.
FURNKRANZ J.
GALAVOTTI L.
GALE W. A.
GOVERT N.
GRAY W.A.
GUTHRIE L.
HAYES P.J.
HEAPS H.
HERSH W.
HULL D. A.
ITTNER D.J.
IWAYAMA M.
IYER R.D.
JOACHIMS T.
JOHN G. H.
JUNKER M.
KESSLER B.
KIM Y.-H.
KLINKENBERG R.
KNORZ G.
KOLLER D.
LAM S.L.
LAM W.
LANG K.
LARKEY L. S.
LARKEY L.S.
LEWIS D. D.
LEWIS D.D.
LI H.
LI Y.H.
LIERE R.
LIM J. H.
MASAND B.
MCCALLUM A. K.
MCCALLUM A.K.
MLADENIC D.
MOULINIER I.
MYERS K.
NG H.T.
OH H.-J.
PAZIENZA M. T.
RILOFF E.
ROBERTSON S.E.
ROTH D.
RUIZ M.E.
SABLE C.L.
SARACEVIC T.
SCHAPIRE R. E.
SCHUTZE H.
SCOTT S.
SEBASTIANI F.
SINGHAL A.
SLONIM N.
TAIRA H.
TUMER K.
TZERAS K.
VAN RIJSBERGEN C. J.
WIENER E.D.
YANG Y.
YU K.L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2001

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.

Comment: Accepted for publication in ACM Computing Surveys

arXiv.org e-Print Archive

CiteSeerX

Crossref

An information-theoretic approach to automatic query expansion

Author: BALLERINI J.P.
BIGI B.
BRAJNIK G.
Brigitte Bigi
BUCKLEY C.
CARPINETO C.
Claudio Carpineto
COOPER J.W.
CROFT W.
DEERWESTER S.
DIETTERICH T.
DOSZCOCKS T. E.
EFTHIMIADIS E. N.
FITZPATRICK L.
Giovanni Romano
HARMAN D.
HARPER D.J.
HAWKING D.
HEARST M.A.
KARP D.
KATZ S.
LARKEY L.S.
MITRA M.
PONTE J.M.
PORTER M. F.
Renato de Mori
ROBERTSON S.E.
ROCCHIO J.
SALTON G.
SCHAPIRE R.E.
SINGHAL A.
VAN RIJSBERGEN C. J.
VAN RIJSBERGEN C.J.
VELEZ B.
VOORHEES E.
VOORHEES E. M.
VOORHEES E.M.
XU J.
YANG K.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Hindi CLIR in thirty days

Author: Abduljaleel N.
Aljlayl M.
Ballestros L.
Berger A.
Chen A.
Davis M.W.
Larkey L.S.
Leah S. Larkey
Margaret E. Connell
Nasreen Abduljaleel
NTCIR
Oard D.W.
Och F.J.
Peters C.
Pirkola A.
Ramanathan A.
Xu J.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Expansion Finding for Given Acronyms Using Conditional Random Fields

Author: B. Tasker
F. Peng
K. Sato
K. Taghva
L.R. Rabiner
L.S. Larkey
V.N. Vapnik
Y. Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Crossref

Jefferson : nordisk tidskrift för Blues

Author: A. Chen
D.W. Oard
F.C. Ekmekcioglu
H.S. Mustafa
I. Al-Kharashi
J. Xu
K. Darwish
L.S. Larkey
W.B. Frakes
Publication date: 01/01/2010

In languages with high word inflation such as Arabic, stemming improves text retrieval performance by reducing word variants. We propose a change to the corpus-based stemming approach proposed by Xu and Croft for English and Spanish in order to stem Arabic words. We generate the conflation classes by clustering 3-gram representations of the words found in only 10% of the data in the first stage. In the second stage, these clusters are refined using different similarity measures and thresholds. We conducted retrieval experiments using raw data, the Light-10 stemmer, and 8 different variations of the similarity measures and thresholds, and compared the results. The experiments show that 3-gram stemming using the Dice distance for clustering and the EM similarity measure for refinement performs better than no stemming but slightly worse than the Light-10 stemmer. Our method could potentially outperform the Light-10 stemmer if more text is sampled in the first stage.

Crossref

Research Online

GSI Repository

Arabic supervised learning method using N‐gram

Author: Abdelali A.
Goweder A.
Khaldoun Zreik
Khoja S.
Larkey L.S.
Lavrenko V.
Mahmoud Rammal
Majed Sanan
Mayfield J.
Publication venue: 'Emerald'

Crossref

Prosodic Phrasing and Comprehension

Author: Angelien A. Sanderman
LARKEY L.S.
LEE L.
LEVELT W.J. M.
PISONI D.B.
RALSTON J.V.
René Collier
TERKEN J.M. B.
Publication venue: 'SAGE Publications'

Crossref

Understanding the acceptance and usage of project management methodologies

Author: D. Fetterly
D.D. Lewis
F. Sebastiani
G. Forman
G. Fung
G. Salton
G. Widmer
G.H. John
I. Witten
K.V. Chandrinos
L.S. Larkey
P.Y. Lee
R.E. Schapire
Y. Yang
Publication venue: Université de Lausanne, Faculté des hautes études commerciales
Publication date: 01/01/2005

Crossref

Serveur académique lausannois

Computing a Comprehensible Model for Spam Filtering

Author: C. Apte
C. Chen
D.H. Wolpert
H. Drucker
I. Witten
J. Quinlan
J.L. Triviño-Rodriguez
J.R. Méndez
L.S. Larkey
R. Schapire
T. Dietterich
W.W. Cohen
Y. Freund
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Crossref